--- /dev/null
+General ideas of Pixops
+=======================
+
+ - Gain speed by special-casing the common case, and using
+ generic code to handle the uncommon case.
+
+ - Most of the time in scaling an image is in the center;
+ however code that can handle edges properly is slow
+ because it needs to deal with the possibility of running
+ off the edge. So make the fast case code only handle
+ the centers, and use generic, slow, code for the edges,
+
+Structure of Pixops
+===================
+
+The code of pixops can roughly be grouped into four parts:
+
+ - Filter computation functions
+
+ - Functions for scaling or compositing lines and pixels
+ using precomputed filters
+
+ - pixops process, the central driver that iterates through
+ the image calling pixel or line functions as necessary
+
+ - Wrapper functions (pixops_scale/composite/composite_color)
+ that compute the filter, chooses the line and pixel functions
+ and then call pixops_processs with the filter, line,
+ and pixel functions.
+
+
+pixops process is a pretty scary looking function:
+
+static void
+pixops_process (guchar *dest_buf,
+ int render_x0,
+ int render_y0,
+ int render_x1,
+ int render_y1,
+ int dest_rowstride,
+ int dest_channels,
+ gboolean dest_has_alpha,
+ const guchar *src_buf,
+ int src_width,
+ int src_height,
+ int src_rowstride,
+ int src_channels,
+ gboolean src_has_alpha,
+ double scale_x,
+ double scale_y,
+ int check_x,
+ int check_y,
+ int check_size,
+ guint32 color1,
+ guint32 color2,
+ PixopsFilter *filter,
+ PixopsLineFunc line_func,
+ PixopsPixelFunc pixel_func)
+
+(Some of the arguments should be moved into structures. It's basically
+"all the arguments to pixops_composite_color plus three more") The
+arguments can be divided up into:
+
+
+Information about the destination buffer
+
+ guchar *dest_buf, int dest_rowstride, int dest_channels, gboolean dest_has_alpha,
+
+Information about the source buffer
+
+ guchar *src_buf, int src_rowstride, int src_channels, gboolean src_has_alpha,
+ int src_width, int src_height,
+
+Information on how to scale the source buf and the region of the scaled source
+to render onto the destination buffer
+
+ int render_x0, int render_y0, int render_x1, int render_y1
+ double scale_x, double scale_y
+
+Information about a constant color or check pattern onto which to to composite
+
+ int check_x, int check_y, int check_size, guint32 color1, guint32 color2
+
+Information precomputed to use during the scale operation
+
+ PixopsFilter *filter, PixopsLineFunc line_func, OixopsPixelFunc pixel_func
+
+
+Filter computation
+==================
+
+The PixopsFilter structure looks like:
+
+struct _PixopsFilter
+{
+ int *weights;
+ int n_x;
+ int n_y;
+ double x_offset;
+ double y_offset;
+};
+
+
+'weights' is an array of size:
+
+ weights[SUBSAMPLE][SUBSAMPLE][n_x][n_y]
+
+SUBSAMPLE is a constant - currently 16 in pixops.c.
+
+
+In order to compute a scaled destination pixel we convolve
+an array of n_x by n_y source pixels with one of
+the SUBSAMPLE * SUBSAMPLE filter matrices stored
+in weights. The choice of filter matrix is determined
+by the fractional part of the source location.
+
+To compute dest[i,j] we do the following:
+
+ x = i * scale_x + x_offset;
+ y = i * scale_x + y_offset;
+ x_int = floor(x)
+ y_int = floor(y)
+
+ C = weights[SUBSAMPLE*(x - x_int)][SUBSAMPLE*(y - y_int)]
+ total = sum[l=0..n_x-1, j=0..n_y-1] (C[l,m] * src[x_int + l, x_int + m])
+
+The filter weights are integers scaled so that the total of the
+weights in the weights array is equal to 65536.
+
+When the source does not have alpha, we simply compute each channel
+as above, so total is in the range [0,255*65536]
+
+ dest = src / 65536
+
+When the source does have alpha, then we need to compute using
+"pre-multiplied alpha":
+
+ a_total = sum (C[l,m] * src_a[x_int + l, x_int + m])
+ c_total = sum (C[l,m] * src_a[x_int + l, x_int + m] * src_c[x_int + l, x_int + m])
+
+This gives us a result for c_total in the range of [0,255*a_total]
+
+ c_dest = c_total / a_total
+
+
+Mathematical aside:
+
+The process of producing a destination filter consists
+of:
+
+ - Producing a continuous approximation to the source
+ image via interpolation.
+
+ - Sampling that continuous approximation with filter.
+
+This is representable as:
+
+ S(x,y) = sum[i=-inf,inf; j=-inf,inf] A(frac(x),frac(y))[i,j] * S[floor(x)+i,floor(y)+j]
+
+ D[i,j] = Integral(s=-inf,inf; t=-inf,inf) B(i+x,j+y) S((i+x)/scale_x,(i+y)/scale_y)
+
+By reordering the sums and integrals, you get something of the form:
+
+ D[i,j] = sum[l=-inf,inf; m=-inf;inf] C[l,m] S[i+l,j+l]
+
+The arrays in weights are the C[l,m] above, and are thus
+determined by the interpolating algorithm in use and the
+sampling filter:
+
+ INTERPOLATE SAMPLE
+ ART_FILTER_NEAREST nearest neighbour point
+ ART_FILTER_TILES nearest neighbour box
+ ART_FILTER_BILINEAR (scale < 1) nearest neighbour box (scale < 1)
+ ART_FILTER_BILINEAR (scale > 1) bilinear point (scale > 1)
+ ART_FILTER_HYPER bilinear box
+
+
+Pixel Functions
+===============
+
+typedef void (*PixopsPixelFunc) (guchar *dest, int dest_x, int dest_channels, int dest_has_alpha,
+ int src_has_alpha,
+ int check_size, guint32 color1, guint32 color2,
+ int r, int g, int b, int a);
+
+The arguments here are:
+
+ dest: location to store the output pixel
+ dest_x: x coordinate of destination (for handling checks)
+ dest_has_alpha, dest_channels: Information about the destination pixbuf
+ src_has_alpha: Information about the source pixbuf
+
+ check_size, color1, color2: Information for color background for composite_color variant
+
+ r,g,b,a - scaled red, green, blue and alpha
+
+r,g,b are premultiplied alpha.
+
+ a is in [0,65536*255]
+ r is in [0,255*a]
+ g is in [0,255*a]
+ b is in [0,255*a]
+
+If src_has_alpha is false, then a will be 65536*255, allowing optimization.
+
+
+Line functions
+==============
+
+typedef guchar *(*PixopsLineFunc) (int *weights, int n_x, int n_y,
+ guchar *dest, int dest_x, guchar *dest_end, int dest_channels, int dest_has_alpha,
+ guchar **src, int src_channels, gboolean src_has_alpha,
+ int x_init, int x_step, int src_width,
+ int check_size, guint32 color1, guint32 color2);
+
+The argumets are:
+
+ weights, n_x, n_y
+
+ Filter weights for this row - dimensions weights[SUBSAMPLE][n_x][n_y]
+
+ dest, dest_x, dest_end, dest_channels, dest_has_alpha
+
+ The destination buffer, function will start writing into *dest and
+ increment by dest_channels, until dest == dest_end. Reading from
+ src for these pixels is guaranteed not to go outside of the
+ bufer bounds
+
+ src, src_channels, src_has_alpha
+
+ src[n_y] - an array of pointers to the start of the source rows
+ for each filter coordinate.
+
+ x_init, x_step
+
+ Information about x positions in source image.
+
+ src_width - unused
+
+ check_size, color1, color2: Information for color background for composite_color variant
+
+ The total for the destination pixel at dest + i is given by
+
+ SUM (l=0..n_x - 1, m=0..n_y - 1)
+ src[m][(x_init + i * x_step)>> SCALE_SHIFT + l] * weights[m][l]
+
+
+Algorithms for compositing
+==========================
+
+Compositing alpha on non alpha:
+
+ R = As * Rs + (1 - As) * Rd
+ G = As * Gs + (1 - As) * Gd
+ B = As * Bs + (1 - As) * Bd
+
+This can be regrouped as:
+
+ Cd + Cs * (Cs - Rd)
+
+Compositing alpha on alpha:
+
+ A = As + (1 - As) * Ad
+ R = (As * Rs + (1 - As) * Rd * Ad) / A
+ G = (As * Gs + (1 - As) * Gd * Ad) / A
+ B = (As * Bs + (1 - As) * Bd * Ad) / A
+
+The way to think of this is in terms of the "area":
+
+The final pixel is composed of area As of the source pixel
+and (1 - As) * Ad of the target pixel. So the final pixel
+is a weighted average with those weights.
+
+Note that the weights do not add up to one - hence the
+non-constant division.
+
+
+Integer tricks for compositing
+==============================
+
+
+
+MMX Code
+========
+
+Line functions are provided in MMX functionsfor a few special
+cases:
+
+ n_x = n_y = 2
+
+ src_channels = 3 dest_channels = 3 op = scale
+ src_channels = 4 with alpha dest_channels = 4 no alpha op = composite
+ src_channels = 4 with alpha dest_channels = 4 no alpha op = composite_color
+
+For the case n_x = n_y = 2 - primarily hit when scaling up with bilinear
+scaling, we can take advantage of the fact that multiple destination
+pixels will be composed from the same source pixels.
+
+That is a destination pixel is a linear combination of the source
+pixels around it:
+
+
+ S0 S1
+
+
+
+
+
+ D D' D'' ...
+
+
+
+
+ S2 S3
+
+Each mmx register is 64 bits wide, so we can unpack a source pixel
+into the low 8 bits of 4 16 bit words, and store it into a mmx
+register.
+
+For each destination pixel, we first make sure that we have pixels S0
+... S3 loaded into registers mm0 ...mm3. (This will often involve not
+doing anything or moving mm1 and mm3 into mm0 and mm1 then reloading
+mm1 and mm3 with new values).
+
+Then we load up the appropriate weights for the 4 corner pixels
+based on the offsets of the destination pixel within the source
+pixels.
+
+We have preexpanded the weights to 64 bits wide and truncated the
+range to 8 bits, so an original filter value of
+
+ 0x5321 would be expanded to
+
+ 0x0053005300530053
+
+For source buffers without alpha, we simply do a multiply-add
+of the weights, giving us a 16 bit quantity for the result
+that we shift left by 8 and store in the destination buffer.
+
+When the source buffer has alpha, then things become more
+complicated - when we load up mm0 and mm3, we premultiply
+the alpha, so they contain:
+
+ (a*ff >> 8) (r*a >> 8) (g*a >> 8) (b*a >> a)
+
+Then when we multiply by the weights, and add we end up
+with premultiplied r,g,b,a in the range of 0 .. 0xff * 0ff,
+call them A,R,G,B
+
+We then need to composite with the dest pixels - which
+we do by:
+
+ r_dest = (R + ((0xff * 0xff - A) >> 8) * r_dest) >> 8
+
+(0xff * 0xff)